Data Science Workflows
Review
Discussion
Break
Minimal R Package
source: R4DS
Project management spans all aspects of the data science cycle.
I asked you to: Find 3 data science projects on GitHub and explore how they organise their work.
In groups of 2-3, each pitch one of the projects you found.
05:00
I asked you to: Create your own project directory (or directories) for this course and its assignments.
https://www.menti.com/alrej2iedgtr
Find a person you haven’t spoken to today and explain your reasoning.
05:00
I asked you to: Write a function to calculate the rolling arithmetic mean of a numeric vector.
Pair up with a third person that you haven’t spoken to yet today.
Discuss your thought process and compare code.
What did you have to consider when writing this function?
Report back: One decision that you made differently or a decision that you made the same, but implemented differently.
12:00
window_width is even?
is.integer()? Nope.
x, x, window_width, NULL?
rolling_mean <- function(x, window_width, ...){
  # ----- Input Checks ---------------------------------------------------------
  # Check that x is a vector with numerical interpretation
  stopifnot(is.logical(x) || is.integer(x) || is.double(x) || is.complex(x))
  stopifnot(length(x) > 0)
  # Check window_width is an odd, positive integer.
  # is.integer(3) is FALSE because 3 is stored as a double, so test the value.
  stopifnot(length(window_width) == 1)
  stopifnot(window_width %% 1 == 0)
  stopifnot((window_width / 2) %% 1 != 0)
  stopifnot(window_width > 0)
  # ----- Function Body --------------------------------------------------------
  # Number of values left and right to include in each mean
  half_width <- floor(window_width / 2)
  # Pad both ends with NAs so that every window has the full width.
  # Base R stand-in for the pad_with_NAs() helper, which is not defined here.
  x_padded <- c(rep(NA, half_width), x, rep(NA, half_width))
  evaluation_locations <- seq_along(x) + half_width
  output <- rep(NA, length(x))
  for (index in evaluation_locations) {
    # Extract relevant values from x_padded
    indices_in_window <- seq(index - half_width, index + half_width, by = 1)
    values_in_window <- x_padded[indices_in_window]
    # Calculate and store mean
    output[index - half_width] <- mean(values_in_window, ...)
  }
  return(output)
}

#' Calculate the rolling mean of a vector
#'
#' @param x Vector of values that can be interpreted numerically.
#' @param window_width The number of values included in each mean calculation. Should be an odd, positive integer.
#' @param ... Additional arguments to pass to the mean() function call.
#'
#' @return A vector of rolling mean values of the same length as `x`.
#' @export
#'
#' @examples
#'
#' rolling_mean(x = 1:5, window_width = 3)
#' rolling_mean(x = 1:5, window_width = 5)
#' rolling_mean(x = 1:5, window_width = 7)
#' rolling_mean(x = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE), window_width = 3)
#'
rolling_mean <- function(x, window_width, ...){}
05:00
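To see what a centred rolling mean returns for inputs like those in the documented examples, here is a minimal, self-contained sketch. The name `rolling_mean_sketch` is hypothetical and this is a compact variant for illustration, not the function from the slides. It also shows why the input check tests the value of `window_width` rather than calling `is.integer()`, and cross-checks the result against base R's `stats::filter()`.

```r
# Hypothetical compact variant, for illustration only.
rolling_mean_sketch <- function(x, window_width, ...) {
  stopifnot(is.numeric(x) || is.logical(x), length(x) > 0)
  stopifnot(length(window_width) == 1, window_width %% 2 == 1, window_width > 0)

  # Pad both ends with NAs so that every window has the full width.
  half_width <- (window_width - 1) / 2
  x_padded <- c(rep(NA, half_width), x, rep(NA, half_width))

  # One mean per element of x, taken over the window centred on it.
  vapply(
    seq_along(x),
    function(i) mean(x_padded[i:(i + window_width - 1)], ...),
    numeric(1)
  )
}

# A bare 3 is a double, so is.integer() rejects a perfectly sensible width.
is.integer(3)   # FALSE
is.integer(3L)  # TRUE
3 %% 1 == 0     # TRUE: tests the value, not the storage type

# Windows that overlap the padding contain NA, so the ends are NA by default.
rolling_mean_sketch(1:5, window_width = 3)  # NA 2 3 4 NA

# Arguments in ... are passed to mean(); na.rm = TRUE averages what is available.
rolling_mean_sketch(1:5, window_width = 3, na.rm = TRUE)

# Cross-check against a centred moving average from base R.
filter_result <- as.numeric(stats::filter(1:5, rep(1 / 3, 3), sides = 2))
all.equal(rolling_mean_sketch(1:5, window_width = 3), filter_result)
```

Whether the ends should be `NA` or a mean over fewer values is one of the design decisions worth comparing; here it is left to the caller through `...`.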
Programming and package development are huge topics.
In 1 hour we will cover the easiest 20%, which will cover 80% of everything you ever need.
You will need:
{devtools}
{usethis}
{testthat}
{roxygen2}
For the “hardcore” folks: you can do all of this by hand, but it is an absolute pain. These tools were developed for a reason.
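As a rough sketch of how these four tools fit together, the steps of a minimal package workflow might look like the function below. The function name `minimal_package_workflow`, the argument `pkg_path` and the file name `"rolling_mean"` are hypothetical, and the sketch assumes all four packages are installed. In practice you would usually run these calls one at a time in an interactive session rather than wrapping them in a function.

```r
# Hypothetical wrapper, for illustration only; assumes {usethis}, {devtools},
# {testthat} and {roxygen2} are installed. Nothing runs until it is called.
minimal_package_workflow <- function(pkg_path) {
  # Once per package: create the skeleton (DESCRIPTION, NAMESPACE, R/).
  usethis::create_package(pkg_path, open = FALSE)

  # Point usethis at the new package for the remaining steps.
  usethis::proj_set(pkg_path)

  # Once per package: set up the tests/testthat/ infrastructure.
  usethis::use_testthat()

  # Once per function: a file in R/ and a matching test file.
  usethis::use_r("rolling_mean", open = FALSE)
  usethis::use_test("rolling_mean", open = FALSE)

  # Repeatedly while developing: turn roxygen comments into documentation,
  # run the tests, then run the full package check.
  devtools::document(pkg_path)
  devtools::test(pkg_path)
  devtools::check(pkg_path)
}

# Example call (creates files on disk, so it is left commented out):
# minimal_package_workflow(file.path(tempdir(), "mypackage"))
```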
Once per package
This should be:
Naming things is hard.
zvtools, broman, ralphr
ColourBrewer, PrettyCols, wesanderson
lubridate, sp, spatstat, ismev
https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
https://kbroman.org/pkg_primer/
https://www.pipinghotdata.com/posts/2020-10-25-your-first-r-package-in-1-hour/#acknowledgements
https://r-pkgs.org/
Effective Data Science: Workflows - Organising Your Code - Zak Varty